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A  new  relaxation  algorithm  for  circuit  simulation  that  combines  the  advantages  of  iterated 
timing  analysis  (ITA)  and  waveform-relaxation  (WR)  is  described.  The  method  is  based  on 
using  an  iterative  step-size  refinement  strategy  with  a  waveform-relaxation-Newton  (WRN) 
algorithm.  All  three  relaxation  techniques,  ITA,  WR,  and  WRN,  are  compared  and 
experimental  results  that  indicate  the  strengths  and  weaknesses  of  the  methods  are 
presented.  In  addition,  a  new  convergence  proof  for  the  waveform-Newton  method  for 
systems  with  nonlinear  capacitors  is  provided.  Finally,  it  is  shown  that  the  step-refined 
WRN  algorithm  can  be  implemented  on  a  parallel  processor  in  such  a  way  that  not  only 
can  different  subsystems  be  processed  in  parallel  but,  in  addition,  the  solution  at  different 
timepoints  of  the  same  subsystem  can  be  computed  in  parallel. 
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Abstract 

A  new  relaxation  algorithm  lor  circuit  simulation  that  combine*  the  advantages  of  Iterated  Timing 
Analysis  (ITA)  and  waveform  relaxation  (WR)  is  described.  The  method  is  based  on  using  an  iterative 
stepsise  refinement  strategy  with  a  waveform-  relax*  tion-Newton  (WRN)  algorithm.  All  three  relaxation 
techniques,  ITA,  WR,  and  WRN,  are  compared  and  experimental  results  that  indicate  the  strengths  and 
weaknesses  of  the  methods  are  presented.  In  addition,  a  new  convergence  proof  for  the  waveform-Newton 
method  for  systems  with  nonlinear  capacitors  is  provided.  Finally,  it  is  shown  that  the  step-refined  WRN 
algorithm  can  be  implemented  on  a  parallel  processor  in  such  a  way  that  not  only  can  different  subsystems 
be  processed  in  parallel  but,  in  addition,  the  solution  at  different  timepoints  of  the  same  subsystem  can 
be  computed  in  parallel. 

1  Introduction 

The  implicit  multistep  integration  algorithms  used  in  general-purpose  circuit  simulation  programs,  like 
SPICE2[NAG75]  and  ASTAP[WEE73],  have  proven  to  be  reliable  but  are  computationally  expensive  when 
applied  to  large  systems.  This  is  because  each  step  of  the  numerical  integration  requires  the  implicit  solution 
of  a  large  nonlinear  algebraic  system.  If  the  circuit  simulation  program  is  intended  for  the  simulation  of 
mostly  MOS  digital  circuits,  then  it  is  possible  to  exploit  the  properties  of  these  types  of  circuits  to  improve 
the  simulator’s  efficiency.  In  particular,  the  fact  that  MOS  digital  circuits  can  be  partitioned  into  loosely  or 
unidirectionally  coupled  subsystems  can  be  exploited  by  using  iterative  decomposition  algorithms,  and  the 
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fact  that  the  different  node*  in  an  MOS  digital  circuit*  often  change  at  very  different  rates  can  be  exploited 
by  using  multirate  integration  techniques(GEA80]. 

A  variety  of  algorithms  have  been  applied  to  the  simulation  of  MOS  digital  circuits  that  attempt  to  ex¬ 
ploit  its  loosely  coupled  and  multirate  nature(for  a  comprehensive  list  of  references  see[NEW83]  or  [WHI86]). 
We  will  focus  on  three  relaxation- based  methods,  the  Iterated  Timing  Analysis  (ITA)  method  used  in  the 
SPLICE  programs  [SAL83,  KLE84,  SAL87A],  the  waveform  relaxation(WR)  algorithm  used  in  the  RE¬ 
LAX  programs[LEL82,  WHI86],  and  a  third  new  algorithm  based  on  combining  a  waveform-Newton  algo- 
rithm[WHI86,  GUA85]  with  an  iterative  refinement  timestep  selection  strategy. 

Section  2  begins  with  a  brief  description  of  the  classical  circuit  simulation  algorithms  used  in  programs 
like  SPICE  and  ASTAP.  In  Section  3,  the  Iterated  Timing  Analysis  and  waveform  algorithms  are  presented, 
along  with  a  short  discussion  of  their  relative  merits.  In  Section  4,  the  waveform  relaxation-Newton  (WRN) 
algorithm  and  the  iterative  refinement  strategy[WHI85]  for  selecting  the  numerical  integration  timesteps  are 
described,  and  the  advantages  of  the  combined  algorithm  on  a  parallel  processor  are  mentioned.  In  Section 
5,  a  brief  description  of  the  implementation  of  the  relaxation  algorithms  is  given,  and  experimental  results 
that  indicate  the  strengths  and  weaknesses  of  the  three  techniques  are  presented. 

2  Classical  Circuit  Simulation 

In  this  section  we  will  describe  the  direct  methods  used  in  circuit  simulation  programs  like  SPICE[NAG75] 
or  ASTAP[WEE73]-  These  methods  are  referred  to  as  standard  or  classical  methods,  partly  because  SPICE 
and  ASTAP  are  extensively  used,  and  partly  because  in  this  area,  any  method  more  than  ten  years  old  is 
considered  to  be  a  classical  method. 

The  behavior  of  a  noninductive  electrical  circuit  can  be  described  by  a  system  of  equations  involving 
node  voltages,  resistive  currents,  and  capacitor  charges.  This  system  of  equations  can  be  constructed  from 
a  circuit  description  using  the  technique  referred  to  as  nodal  anaiysis{DES69},  which  involves  applying  the 
Kirchoff  Current  Law  (KCL)  to  each  node  in  the  circuit,  and  applying  the  constitutive  or  branch  equations 
to  each  circuit  element.  Systems  of  equations  so  generated  have  the  following  form, 

u(0)  =  «(<))  (1) 

where  n  is  the  number  of  circuit  nodes  excluding  the  reference,  q(v(t),  u(t))<JR"  is  the  vector  of  sums  of 
capacitive  charges  at  each  node,  y(i>(<),  u(<))f2R"  is  the  vector  of  sums  of  resistive  currents  at  each  node, 
u is  the  vector  of  input  voltages,  and  »(<)<!?"  is  the  vector  of  node  voltages. 


It  is  possible  to  extend  the  nodal  analysis  technique  to  include  circuits  with  inductors  and  voltage  sources 
by  using  modified  nodal  analysis[H07S],  while  still  preserving  the  form  of  (1)[WHI86].  The  unknowns  then 
become  a  mixture  of  node  voltages  and  element  currents,  and  the  left-hand  side  of  (1)  becomes  a  vector  of 
charges  and  inductor  fluxes.  In  order  to  simplify  the  notation  used  in  this  paper,  we  will  refer  to  the  above 
nodal  analysis  form  with  only  node  voltages  as  the  unknowns.  However,  many  of  the  comments  made  in  the 
following  about  these  systems  apply  to  any  system  in  the  form  of  (1). 

The  trapes oid al  numerical  integration  algorithm  is  frequently  used  to  approximate  the  system  of  (1)  by  a 
sequence  of  implicit  algebraic  equations.  Given  a  timestep,  A,  the  trapesoidal  integration  algorithm  applied 
to  (1)  yields: 


q(v(t  +  A),  u(<  +  A))  -  q(w(<),«(0)  *  0.5A[*(v(t  +  A),  u(t  +  A))  +  p(i»(t), u(t))]  (2) 

where  v(t)  is  known,  and  the  equation  must  be  solved  to  compute  v(t  +  A).  The  iterative  Newton-Raphaon 
algorithm  is  usually  used  to  solve  the  implicit  nonlinear  algebraic  system  given  by  (2).  The  Newton-Raphaon 
iteration  equation  for  solving  a  general  nonlinear  system  of  the  form  F(x)  =  0  is  7r(**)(jt*+1  -**)  =  -F(xk) 
where  F(x*)  is  referred  to  as  the  Newton  residue  and  Jf  is  the  Jacobian  matrix  of  F  with  respect  to  x. 

If  the  Newton  algorithm  is  used  to  solve  (2)  for  v(t  4-  A),  the  residue,  F(vk(t  +  A)),  is: 

F(vk(t  +  A))  =  q(vk(t  4-  A),  u(t  +  A))  -  ,(v(t). u (t))  -  0.5A(p(t;*(<  4-  A),  u(t  4-  A))  +  *(*(<),  «(*)))  (3) 

and  the  Jacobian  of  F{vk(t  4-  A)),  JF(vk{t  4-  A))  is: 

■>,(»»« + »> = oic‘«  + »» _  oMite-i'.*  gMj+m.  «) 

(JV  ov 

Then  v*+1(t  +  A)  is  derived  from  vk(i  +  A)  by  solving  the  linear  system  of  equations 

JF(vk(t  +  A))[t>*+l(f  +  A)  -  vk{t  +  A)]  =  -F(v*(t  +  A))  (5) 

using  some  form  of  Gaussian  elimination.  The  Newton  iteration  is  continued  until  sufficient  convergence  is 
achieved,  that  is  ||®*+1(f  +  A)  -  »*((  4-  A)||  <  t  and  F(v*(t  4-  A))  is  dose  enough  to  iero. 

3  Relaxation  In  Circuit  Simulation 

As  is  clear  from  (S),  using  classical  techniques  to  simulate  large  circuits  implies  solving  large  linear  systems, 
usually  with  some  form  Gaussian  elimination.  In  general,  matrix  solution  by  Gaussaa  elimination  grows 
in  complexity  like  n3  where  n  is  the  number  of  unknowns  in  the  system.  This  rapid  growth  in  complexity 
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haa  focused  attention  on  finding  other  approaches  for  solving  large  linear  systems.  Since  the  matrices  that 
describe  the  linear  systems  in  circuit  simulation  problems  are  usually  sparse,  that  is,  most  of  matrix  entries 
are  sero,  clever  data  structures  and  careful  reordering  algorithms  have  been  used  to  reduce  the  complexity  of 
Gaussian  elimination  for  sparse  matrices  to  less  than  na[KUN86).  However,  sparse  matrix  techniques  based 
on  Gaussian  elimination  still  grow  superlinearly  with  the  number  of  unknowns,  and  therefore  researchers 
have  continued  to  seek  matrix  solution  methods  that  do  not  grow  as  rapidly. 

Relaxation  methods  comprise  one  class  of  iterative  matrix  solution  methods  that  tend  to  have  a  slower 
computational  growth  than  sparse  Gaussian  elimination.  In  particular,  the  computation  per  iteration  grows 
linearly  with  the  number  of  nonzero  matrix  entries.  The  disadvantage  of  relaxation  is  that  each  iteration  only 
produces  an  approximate  solution  to  a  linear  system,  and  repeated  relaxation  iterations  do  not  necessarily  im¬ 
prove  the  approximation.  For  circuit  simulation  problems,  relaxation  methods  work  well  because,  in  general, 
repeated  relaxation  iterations  do  converge  to  the  exact  solution[NEW83].  As  an  example,  consider  solving 
the  linear  problem  Ax  -  6  =  0  where  x  =  (xi,  ...,x„)T,h  =  (6i,  ...,6„)T,Xi,6i<2R,  and  A  =  («<,-),  AtSf'xn. 
One  could  attempt  to  solve  this  problem  by  solving  the  equations  one  row  at  time,  guessing  values  for  the 
Xi  s  that  have  not  been  computed.  This  leads  to  Algorithm  1,  the  Gauss-Seidel  relaxation  algorithm,  which 
will  be  referred  to  several  times  in  the  following  text. 

Algorithm  1  -  Gauss-Seidel  Algorithm  for  solving  Ax  —  b  =  0. 

The  superscript  k  is  the  iteration  count  and  e  is  a  small 
positive  Dumber. 
k  -0. 

Guess  some  F°. 
repeat  { 

*<-*  +  1. 

for  each  (i<{l, .., n})x?  =  £(6,  -  «<>**  +  £?=,+ 1 

}  until  (||x*  -  xk_l||  <  c) 

The  linear  Gauss-Seidel  relaxation  in  Algorithm  1  could  be  used  instead  of  Gaussian  elimination  to  solve 
the  linear  matrix  problem  of  (2),  and  this  might  reduce  the  computation  time  for  the  matrix  solution  in  a 
circuit  simulation  program.  However,  if  the  circuit  being  simulated  has  fewer  than  several  thousand  nodes, 
at  least  half  of  the  computation  time  is  spent  evaluating  the  Jacobian  and  the  Newton  residue.  For  this 
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reason,  other  relaxation  schemes  have  been  applied  to  circuit  simulation  problems,  ones  that  attempt  to 
both  avoid  Gaussian  elimination  and  reduce  the  time  to  evaluate  the  Jacobian  and  Newton  residue. 

Historically,  the  first  approach  to  using  relaxation  in  a  circuit  simulation  program  was  to  substitute  a 
relaxation  method  for  the  Newton  method  to  solve  the  implicit  nonlinear  algebraic  systems  generated  by 
the  discretisation  method.  Early  programs  based  on  this  approach  attempted  only  to  produce  approximate 
results,  and  performed  only  one  relaxation  iteration  for  each  timestep.  The  techniques  used  in  these  programs 
were  referred  to  as  ”  timing  analysis” ,  and  when  the  relaxation  iteration  was  carried  to  convergence  to  produce 
accurate  results,  the  algorithm  was  referred  to  as  Iterated  Timing  Analysis  (ITA)  [CHA75,  NEW83].  A 
second,  more  recently  developed  approach,  applied  the  relaxation  directly  to  the  differential  equation  system, 
replacing  solving  a  large  differential  equation  system  with  solving  a  collection  of  small  differential  equation 
subsystems[LEL82].  This  second  approach  is  referred  to  as  waveform  relaxation(WR)  because  the  iterates 
are  functions  or  waveforms  over  the  simulation  interval.  In  the  following  two  subsections  we  describe  the 
ITA  and  WR  approaches  in  more  detail. 


3.1  Iterated  Timing  Analysis 


The  above  linear  Gausa-Seidel  relaxation  “recipe”  can  be  applied  to  directly  to  solving  the  nonlinear  algebraic 
system  F(w(t  +  h))  *  0  given  in  (3).  The  nonlinear  system  can  be  rewritten  as  F(t>(t  +  h))  s  (fi(v(t  +  h)) 
, ...,  /„(»(<  +  A)))r  where  /,  :  JP*  -*  2R.  Following  the  Gauas-Seidel  approach  for  linear  systems,  at  each  step 
of  the  relaxation,  the  v<((  +  h)  element  is  updated  by  solving  the  implicit  algebraic  equation 

/•(v*+M(t  +  h))  =  0  (6) 


for  u?+\  where  =  (v{, ....  wf ,  tfo1,. ...v*-1)T. 

It  is  possible  to  use  the  N e  w  ton-Raphson  algorithm  to  solve  the  implicit  algebraic  system  of  (6)  accurately 
at  each  step,  and  algorithms  so  generated  are  referred  to  as  relaxstion-Newton  methods.  However,  in  this 
application,  the  Newton  iteration  is  not  usually  carried  to  convergence  as  it  has  been  shown  that  the  rate  of 
convergence  of  nonlinear  relaxation  is  not  reduced  if,  rather  than  solving  the  implicit  algebraic  systems  at 
each  step,  only  one  iteration  of  the  Newton  method  is  used  [ORT70].  Such  a  Gauss-Seidel-Newton  algorithm 
applied  to  systems  of  the  form  F(v)  =  0  is  (using  the  notation  of  (6)), 


*?+1(<  +  A)  =  v}(t  +  h)~  l9M'**l’£{t  +  h))‘) +  *)) 


(7) 


Applied  to  the  system  in  (3)  we  get 
v?+I  =  v}- 
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The  following  convergence  theorem  it  a  minor  modification  of  the  result  in  [ORT70,WHI86]. 

Theorem  1  Let  the  Gauss-SeideL Newton  relaxation  algorithm  it  need  to  solve  forv(t  +  h)  in  (S).  If  g(v,u) 
i*  continuously  differentiable  with  respect  to  v  uniformly  in  u,  u(f)  is  continuous  with  respect  to  t,  u) 
is  strictly  or  irreducihly  diagonally  dominant  uniformly  over  all  v  and  u,  and  v(t)  +  O(h)  (where  O(h)  is 
continuous  in  A  and  lirru—o  0(h)  =  0 )  is  used  as  the  starting  point  for  the  relaxation,  then  there  exists  an  h 
such  that  for  all  hm  <  h  the  relaxation  converges  to  the  solution  of  (3)  ■. 

The  relaxation- N ewton  methods  are  popular  for  circuit  simulation  problems  for  two  reasons.  The  first  is 
that  for  a  broad  class  of  circuits  the  capacitance  matrix,  given  as  |*(v,  u),  is  strictly  diagonally  dominant. 
This  is  the  case  if  the  circuit  contains  two-terminal  positive  capacitors  (linear  or  nonlinear),  or  any  other 
elements  whose  charge  function  has  a  diagonally  dominant  Jacobian  with  positive  diagonal  entries,  and  there 
exists  some  nonzero  capacitance  to  ground  or  a  voltage  source  at  each  node  in  the  circuit.  Therefore,  for 
these  circuits,  the  relaxation-N  ewton  algorithms  are  guaranteed  to  converge  if  the  integration  timestep  is 
made  small  enough. 

The  second  reason  for  the  popularity  of  relaxation-Newton  methods  is  that  with  proper  application  they 
can  be  used  both  to  avoid  solving  a  large  matrix  problem  by  Gaussian  elimination  and  to  avoid  computing 
portions  of  the  Newton  residue  and  the  Jacobian.  As  can  be  seen  in  from  (7),  only  the  diagonal  terms  of 
the  Jacobian,  |£,  need  be  computed.  In  addition,  the  computation  of  the  Newton  residue  and  the  Jacobian 
can  be  further  reduced  using  two  bypassing  schemes  [SAL83]  that  skip  the  computation  of  portions  of  the 
Jacobian  and  the  residue.  One  type  of  bypassing  is  implemented  by  noting  whether  the  components  of  v 
on  which  /,  depends  have  changed  significantly  in  a  relaxation-Newton  iteration,  and  if  none  of  them  have, 
then  /,  is  not  re-evaluated.  Bypassing  in  time  can  also  be  implemented  by  checking  if  is  close  enough  to 
0  and,  if  so,  then  v,(f  +  h)  will  be  equal  to  «<(()  and  therefore  v<  need  not  be  recomputed  at  t  +  h.  Note  that 
the  effect  of  bypassing  in  time  is  to  exploit  waveform  latency,  that  is,  if  a  node  in  the  circuit  is  not  changing, 
then  the  contributions  to  the  Jacobian  and  Newton  residue  of  the  devices  connected  to  that  node  need  not 
be  computed  at  subsequent  points  in  time. 

3.2  Waveform  Relaxation 

ThejsMssical  approach  and  the  ITA  approach  without  latency  exploitation  suffer  a  loss  of  efficiency  because 
they  apply  the  integration  method  to  the  entire  system  simultaneously.  This  forces  every  differential  equation 


in  the  system  to  be  discretized  identically,  and  thia  discretisation  muat  be  fine  enough  ao  that  the  fastest- 
c hanging  state  variable  in  the  system  is  accurately  represented.  If  it  were  possible  to  pick  different  timesteps 
for  each  differential  equation  in  the  system,  so  that  each  could  use  the  largest  timestep  that  would  accurately 
reflect  the  behavior  of  its  associated  state  variable,  then  the  efficiency  of  the  simulation  would  be  greatly 
improved.  This  is  referred  to  as  the  multirate  problem[GEA80],  and  numerical  integration  methods  that 
allow  for  different  state  variables  to  use  different  timesteps  are  called  multirate  integration  methods. 

Circuit  simulation  programs  that  use  ITA  with  latency  exploitation  do  exhibit  some  of  the  characteristics 
of  a  multirate  integration  method.  If,  at  a  given  timestep,  the  variable  is  at  its  equilibrium  (or  stationary) 
point,  and  the  Vj  variables  on  which  i>,  depends  do  not  change,  then  v<  will  retain  the  value  it  had  before  the 
timestep.  In  fact,  v*  will  never  be  recomputed  until  some  vj  on  which  it  depends  changes.  If  v<  is  bypassed 
for  several  timesteps  the  effect  is  the  same  as  if  a  large  timestep  were  used  to  compute  v<.  Therefore,  a 
bypassing  algorithm  exploits  the  kind  of  multirate  behavior  that  stems  from  a  system  in  which  most  of  of 
the  variables  remain  at  equilibrium.  The  ITA  approach  with  latency  exploitation  does  not,  however,  take  full 
advantage  of  a  system  for  which  the  state  variables  have  different  rates  of  motion  but  are  not  at  equilibrium. 

It  is  possible  to  create  a  full  multirate  integration  method  for  solving  (1)  by  applying  the  Gauas-Seidel 
recipe  for  relaxation  directly  to  the  differential  equation  system  before  introducing  discrete  approximations. 
If  such  an  approach  were  applied  to  (1)  then  at  each  step  of  the  relaxation,  the  waveform  u,  (t)  for  all  te[0,T] 
is  updated  by  solving  the  differential  iteration  equation 

^.V+1 <+1(t), «(/))  -  «.(!))  =  0  t.*+1(0)  =  vi0  (9) 

for  vf+1(t)  on  [0,  T\.  In  this  approach,  each  differential  equation  in  system  is  solved  independently,  and 
each  can  use  an  independent  set  of  integration  timesteps.  Therefore,  the  waveform  relaxation  algorithm  is 
inherently  a  multirate  integration  method. 

The  convergence  of  WR  is  guaranteed  for  most  examples  of  practical  interest  by  the  following  theorem 
(1)  [LEL82,  WHI86]. 

Theorem  2  If,  in  a  system  of  the  form  of  (I),  g(v,  u)  is  Lipsckit*  continuous  with  respect  to  v  uniformly  in 
u,u(t)  is  continuous  with  respect  to  t  and  |$(ti,u)  is  strictly  or  irreducibly  diagonally  dominant  uniformly 
over  allv  and  u,  Men  Me  Gauss-Seidel  waveform  relaxation  algorithm  applied  to  such  a  system  will  converge 
to  the  solution  to  the  system  ■. 

Theorem  2  guarantees  the  convergence  of  WR  for  the  continuous  case,  but  does  not  directly  apply  if 
a  discretization  integration  formula  is  used  to  solve  for  the  iteration  waveforms.  It  is  possible  to  prove 
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a  convergence  theorem  for  the  discretized  WR  algorithm,  but  only  by  assuming  some  conditions  on  the 
timesteps.  In  particular,  one  can  show  that  if  multistep  methods  are  used  to  solve  the  iteration  equations, 
then  the  discretized  WR  algorithm  converges  provided  the  integration  timesteps  are  smaller  than  some 
bound[NEV86].  Not  surprisingly,  the  bound  on  the  integration  timesteps  to  insure  WR  convergence  is 
similar  to  the  bound  on  the  timesteps  required  to  insure  the  ITA  algorithm  above  converges.  In  fact,  for 
linear  systems  and  nonmultirate  timesteps,  the  timestep  constraints  for  ITA  and  waveform  relaxation  are 
identical[WH  186] . 

4  Waveform-Relaxation-Newton  Algorithm 

The  advantage  of  the  standard  WR  algorithm  is  that  it  fully  exploits  multirate  behavior  by  solving  the 
differential  equations  in  a  decoupled  fashion.  However,  each  iteration  involves  solving  nonlinear  differential 
equations  accurately,  and  is  therefore  computationally  expensive.  Although  ITA  can  not  fully  exploit  multi¬ 
rate  behavior  easily[SAL87A],  it  is,  in  many  cases,  more  efficient  than  standard  WR.  This  is  partly  because 
each  relaxation  iteration  for  the  ITA  method  is  much  cheaper,  as  the  nonlinear  relaxation  iteration  equations 
are  solved  approximately  with  a  single  Newton  iteration.  Perhaps  it  is  the  obvious  next  step  to  try  a  similar 
approach  with  the  standard  WR  algorithm.  That  is,  use  a  single  iteration  of  some  kind  of  waveform-Newton 
algorithm  to  approximately  solve  the  WR  iteration  equations. 

It  is  reasonably  straight-forward  to  derive  the  waveform-relaxation-Newton  (WRN)  algorithm  [GUA83, 
BOK83]  and  show  that  WRN  has  similar  convergence  properties  to  standard  WR.  In  addition,  the  iteration 
equations  for  WRN  are  time-varying  linear  differential  equations  and  are  easier  to  solve  than  the  nonlinear 
differential  iteration  equations  of  WR.  However,  WRN  does  not  prove  to  be  much  more  efficient  than  WR 
when  applied  to  simulating  most  digital  circuit  examples,  because  whereas  WR  usually  converges  in  fewer 
than  five  iterations,  and  WRN  may  take  many  more. 

The  fact  that  the  WRN  algorithm  may  take  many  iterations  to  converge  makes  it  possible  to  improve  the 
algorithm’s  efficiency  by  solving  the  iteration  equations  crudely  at  first,  and  then  increasing  the  accuracy 
with  each  iteration.  With  WRN,  it  is  particularly  efficient  to  use  coarse  timesteps  in  the  early  iterations,  and 
then  refine  as  the  iterations  approach  convergence.  Note  that  such  a  technique  is  not  so  helpful  with  WR 
because  WR  usually  converges  so  rapidly  that  there  are  not  enough  iterations  to  do  meaningful  refinement. 
Also,  the  Newton  method  used  at  each  timestep  in  the  WR  iterations  may  not  converge  if  very  large  timesteps 
are  used. 

In  this  section,  a  WRN  algorithm  that  incorporates  timestep  refinement  is  described.  In  Section  4.1  the 
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waveform-Newton  (WN)  [K  AN64]  is  derived  and  some  of  its  limitations  are  described.  In  Section  4.2,  WN  is 
combined  with  WR  to  produce  the  waveform-relaxation-Newton  (WRN)  algorithm.  The  iterative  timestep 
refinement  strategy  used  with  WRN  is  described  in  Section  4.3,  and  its  suitability  for  parallel  computation 
is  mentioned  in  Section  4.4. 

4.1  Derivation  of  the  Waveform  Newton  Algorithm 

In  order  to  derive  the  waveform-Newton  algorithm  it  is  helpful  to  think  in  general  terms.  Finding  a  solution 
to  (1)  can  be  thought  of  as  finding  a  v  such  that  F(v)  —  0,  where  v  €  w,  the  space  of  continuous  and 
differentiable  functions  that  map  the  interval  [0,  T]  into  ZR",  and  F  is  a  function  derived  from  (1)  that  takes 
w  — ►  w.  The  iteration  equation  for  the  Newton- Raphson  algorithm  for  finding  a  solution  v  such  that 
F(v)  =  0  is, 

Jr(vk)[vk*1  =  -F(vk)  (10) 

where  Jr( v)  is  the  Frechet  derivative  of  F(v)  with  respect  to  v.  The  Frechet  derivative  is  defined  [ORT70] 
as  the  linear  operator  on  w  such  that 

lim  jj~||F(»  +  <)  -  F(v)  -  Jf(v)6||  =  0.  (11) 

A a  mentioned  above,  the  function  F  is  derived  from  (l)  so  that  F(v)  =  0  implies  that  v  solves  (1).  A 
pointwise  in  (  description  of  F  is  then  given  by 

(F(v))(t)  =  if (®(l),  u(t))  -  j(v(t)).  (12) 

This  pointwise  description  of  F,  and  the  definition  of  the  Frechet  derivative  given  above,  can  be  used  to 
evaluate  the  derivative  of  F,  denoted  as  Jr-  A  convenient  norm  on  w  in  which  to  evaluate  Jr(v)  from  (11) 
and  (12)  is 

IMI  =  IM0)r  +  m«Tr[0,«)||t’(r)||,r\  (13) 

where  m,  v(t)  is  the  usual  derivative  of  v(t)  with  respect  to  t,  and  ||  ((**  is  any  norm  on  Rn. 

To  begin,  consider  that 

(F(v  +  «))(<)  -  (F(v))(<)  =  &[?(v(t)  +  d«),  «(<))  -  ?(«(<).  «(*))]  ~  W*(«)  +  6(t)<  «(<))  "  H(v(t),  ti(0)] 
u(0)  =  w0  <(0)  =  0.  (14) 

Expanding  and  approximating  to  order  (||6||J, 

(F(v  +  6))(t)  -  (F(v))(t)  =  ^[W”(0.y(«))g(0]  _  °MW6(t)  +  0(||«||J)  (15) 
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From  the  definition  of  the  Frechet  derivative,  (15)  implies  that 
(Jr(v)6)(t)  =  ^[-9(t,-(^u(t))^(t))  - 


(16) 


Substituting  the  computed  derivative  (16)  and  F  given  by  (12)  into  (11)  and  rearranging  yields 

i[, (,*«),»«))  +  a,(">(^“(0i<‘*‘(0]  =  «(»*(>).  “(I))  +  ,.«■(,)  (n) 

where  6*+1(t)  =  «>*+1(0  —  v*(0-  Note  that  if  vk(0)  =  t»o  then  6(0)  =  0.  We  will  refer  to  (17)  as  the 
waveform-Newton(WN)  algorithm  for  solving  (12). 

The  above  formal  derivation  of  the  WN  algorithm  has  a  very  simple  interpretation.  The  differential 
equation  is  linearized  about  an  initial  guess  waveform  whose  value  at  time  zero  matches  the  given  initial 
condition  for  the  differential  equation.  Then,  the  guess  waveform  is  updated  by  solving  the  resulting  linearized 
differential  equation  (with  an  initial  condition  of  zero).  The  original  differential  equation  is  then  relinearized 
about  the  updated  guess,  and  the  process  repeated  until  convergence  is  achieved. 

As  WN  is  just  the  function-space  extension  of  the  classical  Newton-Raphson  algorithm,  it  will  converge 
quadratic  ally  when  the  iterated  value  is  close  to  the  correct  solution[KAN64).  The  WN  algorithm  also  has 
the  attractive  property  that  it  converges  globally  when  applied  to  equations  of  the  form  of  (1),  given  mild 
assumptions  on  the  behavior  of  the  charge  and  current  functions,  q{v)  and  g(v),  and  provided  the  initial 
guess  waveform  matches  the  initial  condition  for  the  differential  equation.  In  particular,  we  have  the  following 
theorem  about  the  convergence  of  the  WN  algorithm  whose  proof  is  given  in  the  appendix. 


Theorem  3  For  any  system  of  the  form  of  (1)  in  which  |*  is  differentiable,  LipschiU  continuous,  and  has 
a  uniformly  bounded  inverse  with  respect  to  v  for  all  u;  g  is  differentiable  and  Lipschitz  continuous;  and 
t>°(f)  is  a  continuous,  differentiable  function  such  that  v°(0)  =  vq;  then  the  sequence  of  waveforms,  {v*}, 
generated  by  the  WN  algorithm  converges  uniformly  to  the  solution  of  (l)M. 


The  iteration  equations  for  the  WN  algorithm  are  time-varying  linear  differential  equations  that  are 
easier  to  solve  numerically  than  the  original  system  of  nonlinear  differential  equations.  For  example,  if  WN 
iteration  equations  are  solved  with  the  same  discretization  techniques  as  used  for  the  classical  direct  method, 
then  since  the  equations  are  linear,  only  a  single  matrix  solution  need  be  performed  at  each  timepoint.  Also, 
linear  time-varying  systems  can  be  solved  with  a  variety  of  efficient  numerical  techniques  other  than  the 
standard  discretization  methods,  such  as  collocation  and  spectral  methods[GUA83j. 

Note  that  the  discretized  WN  algorithm  is  as  ill-suited  to  the  simulation  of  large  problems  as  the  classical 
direct  methods,  because  it  still  requires  the  solution  of  large  linear  equation  systems,  and  can  not  easily 
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Figure  I:  Resistor-Diode  Example 

exploit  multirate  behavior.  But  as  the  discretized  WN  algorithm  and  the  classical  direct  methods  are  of  such 
a  similar  nature,  it  is  perhaps  useful  to  compare  their  relative  efficiencies.  Clearly,  for  linear  problems,  the 
classical  direct  and  the  discretized  WN  methods  are  identical.  However,  the  discretized  WN  can  be  much  less 
efficient  than  classical  direct  methods  when  used  to  simulate  circuits  containing  highly  nonlinear  elements 
such  as  diodes.  This  is  due  to  the  difference  in  bow  the  classical  and  WN  methods  get  the  initial  guesses 
for  their  respective  Newton  methods.  In  the  classical  approach,  generating  an  initial  guess  for  the  Newton 
method  implies  projecting  the  behavior  of  the  system  forward  by  one  timestep.  In  WN,  one  projects  the 
behavior  of  the  system  forward  for  an  entire  waveform.  Therefore,  in  the  classical  method  the  initial  guess  is 
almost  certainly  in  or  near  the  region  of  quadratic  convergence  for  the  Newton  method,  whereas  that  would 
rarely  be  true  for  WN. 

To  demonstrate  this  difficultly  with  WN,  consider  the  problem  illustrated  in  Fig.  (1),  a  simple  resistor- 
diode  circuit  with  a  grounded  capacitor.  Fig.  (2)  shows  the  waveform  iterations  obtained  using  WN  to 
solve  the  circuit,  given  an  initial  guess  of  v°(t)  =  0  for  all  te[0, 1].  Note  that  the  first  computed  waveform 
ul(t)  is  quite  far  from  the  correct  solution,  and  subsequent  iterations  move  very  slowly  back  to  the  correct 
solution.  This  slow  convergence  is  common  when  applying  Newton  methods  to  exponential  nonlinearities, 
given  a  poor  initial  guess.  In  this  case,  over  50  iterations  are  necessary  to  achieve  satisfactory  convergence. 
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Figure  2:  Waveform  Newton  Iteration*  for  the  Resistor-Diode  Example 


A  wide  variety  of  limiting  technique*  can  be  used  to  improve  the  convergence  of  WN  in  this  caae.  For  ex¬ 
ample,  the  change  in  the  waveform  from  one  iteration  to  the  next  could  be  limited,  in  a  way  analogou*  to  *tep 
limiting  in  the  standard  algebraic  Newton  method[NAG75].  There  ia  a  well-known  strategy  that  is  particu¬ 
larly  useful  for  MOS  circuits,  where  the  only  diodes  in  the  circuit  are  associated  with  the  source- to-subetrate 
and  drain-to-substrate  junctions,  and  are  usually  reverse-biased.  Instead  of  using  the  correct  derivative  of 
the  diode  current  with  respect  to  the  voltage  across  the  diode,  the  derivative  can  be  approximated  by  the 
diode  current  divided  by  the  voltage  across  the  diode,  an  approximation  known  as  “line-through-the-origin”. 
Since  only  the  Jacobian  is  altered,  if  the  so  created  pseudo-Newton  method  converges,  it  converges  to  the 
correct  result.  The  danger  is  that  if  the  diode  is  substantially  forward  biased,  an  unlikely  event  for  MOS 
circuits,  this  pseudo-Newton  method  may  not  converge  no  matter  how  close  the  initial  guess. 

4.2  The  Waveform  Relaxation  Newton  Algorithm 

One  useful  application  of  the  WN  algorithm,  and  the  main  focus  in  this  paper,  is  to  combine  it  with  the  WR 
algorithm  to  construct  the  waveform  extension  of  the  relaxation- Newton  algorithms  presented  in  Section  3.1, 
as  in  Algorithm  2  below. 

Algorithm  2  •  (WRN  Gauss-Seidel  Algorithm) 

The  superscript  k  denotes  the  iteration  count,  the  subscript  i«{l, ...,  N} 
denotes  t be  component  index  of  a  vector  and  f  is  a  smell  positive  number, 
k  —  0. 

Guess  waveform  v°(t);  tc[0,  T] 

such  that  v°(0)  =  vq  (for  example,  set  v°(f)  =  vo,fc[0,T]); 

repeat  { 

*  -*  +  1 
for  all  (i  in  N)  { 
solve 

iil<li(vk+l  i(t),u(t))  +  -  vf(t))] 

-IA(®*+M(t),«(0)  +  -  v‘+l(t))]  =  0 

for  (»f+l(<);tt(0.r]),  with  the  initial  condition  vf+1(0)  =  t><o- 

} 

}  until  (||t;*+l  -  u*||  <  t) 
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In  the  waveform  relaxation- Newton  algorithm  (WRN)  [BOK83,  GAU83,  WBI86]  above,  the  WK  itera¬ 
tion  equation*  are  solved  approximately  by  performing  one  step  of  the  waveform-Newton  method  with  each 
waveform  relaxation  iteration.  This  is  analogous  to  the  single  Newton  iteration  strategy  of  the  nonlinear  re¬ 
laxation  methods  used  in  Iterated  Timing  Analysis.  And  like  the  WR  algorithm,  each  equation  in  Algorithm 
2  is  a  differential  equation  in  one  unknown  variable  vf,  but  in  this  case  the  nonlinear  differential  iteration 
equations  have  been  replaced  by  simpler  time-varying  linear  differential  equations. 

Given  the  global  convergence  properties  of  both  the  original  WR  and  the  WN  algorithm,  it  is  not 
surprising  that  the  WRN  algorithm  has  global  convergence  properties,  and  the  proof  for  this  is  quite  similar 
to  the  proof  of  the  WR  and  WN  convergence  theorems. 

Theorem  4  If  the  assumptions  in  Theorem  !  and  Theorem  3  are  both  satisfied,  then  the  sequence  {t>*} 
generated  bg  the  WRN  algorithm  converges  to  the  solution  of  (1)  on  ang  bounded  interval  [0,T]  ■. 

4.3  Timestep  Control  Strategy  for  WRN 

As  mentioned  above,  the  amount  of  computation  performed  in  the  early  iterations  of  the  WRN  can  be  reduced 
by  using  coarse  numerical  integration  timesteps  to  solve  the  differential  relaxation  equations  initially,  and 
then  refining  the  timesteps  as  the  iterations  progress.  Specifically,  the  first  relaxation  iteration  is  computed 
with  a  user-supplied  maximum  allowed  numerical  integration  timestep.  For  subsequent  relaxation  iterations, 
the  integration  timesteps  are  chosen  to  be  the  same  as  those  in  the  previous  iteration  unless  the  a  posteriori 
local  truncation  error  estimates  for  the  timestep  from  the  previous  iteration  is  too  large.  In  that  case,  half 
the  previous  iteration  timestep  is  used. 

To  demonstrate  this  idea,  consider  the  sequence  of  waveforms  in  Fig.  3.  For  the  first  iteration  the 
maximum  timestep  is  used,  which  produces  the  waveform  in  Fig.  (3a),  In  this  case,  the  computed  LTE  at  T 
is  too  large,  and  therefore  on  the  second  iteration,  the  window  interval  is  divided  in  half  and  two  time-steps 
are  taken,  as  shown  in  Fig.  (3b).  In  the  second  iteration,  the  LTE  is  too  large  at  time  point  T/2  but 
acceptable  at  time  point  T.  Therefore,  on  the  third  iteration,  only  the  first  half  interval  is  divided,  as  shown 
in  Fig.  (3c). 

The  iterative  refinement  strategy  has  the  advantage  that,  in  general,  the  timesteps  will  be  placed  more 
efficiently  to  control  truncation  error  than  if  the  standard  predicted  truncation  error  criteria  is  used.  This 
is  because  the  timestep  selection  is  based  only  on  more  accurate  a  posteriori  error  estimates  available  Grom 
previous  relaxation  iterations.  However,  there  are  situations  where  too  many  time  points  will  be  placed.  In 
particular,  if,  in  some  region  of  time,  a  “wavefront”  moves  through  as  the  iterations  progress  (see  Fig.  (4)), 
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Figure  3:  WRN  Time-Step  Control. 

many  timesteps  will  be  placed  in  the  wavefront’*  path.  A  way  to  overcome  this  problem  is  to  remove  time 
points  on  subsequent  iterations  if  the  LTE  is  small  at  a  time  point.  This  is  not  as  simple  an  optimization 
as  would  seem,  as  such  an  approach  can  be  unstable.  That  is,  time  points  which  are  added  on  one  iteration 
are  removed  on  the  next  iteration,  and  then  added  again  on  the  third  iteration,  etc. 

4.4  Parallelizing  WRN  with  Timestep  Refinement 

The  WRN  algorithm  combined  with  the  timestep  refinement  strategy  has  several  advantages  that  make  it  a 
good  algorithm  for  use  on  parallel  processors[SAL87B].  As  with  any  decomposition  method,  the  decomposed 
subsystems  can  be  solved  independently  on  parallel  processors{WHI86].  And  as  with  any  waveform  relaxation 
method,  solving  the  decomposed  subsystems  is  a  significant  computation  involving  numerically  integrating 
the  independent  differential  equations  over  some  interval  of  time.  Therefore,  in  comparison,  communication 
overhead  is  negligible. 

The  above  WRN  algorithm  has  an  additional  advantage  for  parallel  computation.  A*  the  discretisations 
times  are  selected  by  a  timestep  refinement  strategy,  they  are  therefore  known  s  priori,  before  beginning 
the  calculation  of  the  next  iteration  waveform.  And  since  the  iteration  update  equations  for  WRN  are 
based  on  linearisation  about  a  previous  iteration  waveform,  once  the  discretisation  times  are  known,  the 
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Figure  4:  Wavefront  Propagation  During  Iteration* 


linearised  system  at  each  of  the  discrete  times  can  be  computed  in  parallel.  The  result  is  that  not  only  can 
the  decomposed  subsystems  be  computed  in  parallel,  but  most  of  the  computation  for  each  of  the  timesteps 
for  each  of  the  decomposed  subsystems  can  be  computed  in  parallel. 

In  order  to  see  how  the  tiroestep  parallelism  can  be  exploited,  consider  using  WN  to  solve  the  simple 
equation 

*(<)  =  /(*(«))  x(0)  =  x0,  (18) 


where  x(t)  6  H",  and  m  :  IR°  — »  JRT*.  The  WN  iteration  equation  is  then 

x*+1(0  =  /(x‘(t))+^f^(**+1(f)-x‘(t))  x‘+1(0)  =  x*(0)  =  xo  (19) 

where  **(t)  is  the  kth  Newton  iterate.  Discretizing  (19)  with  backward-Euler  leads  to  a  sequence  of  Af  linear 
equations,  where  Af  is  the  number  of  timesteps.  The  equation  that  must  be  solved  to  compute  *k+1(rj)t 
where  17  is  the  time  of  the  j,h  timestep,  is  given  by 


--_-ry  i  (****(*•>}  "  *  /(*(*>))  +  -  x*(r>)). 


Reorganizing  (20)  leads  to 


*+1 


W) 


(r»-i)  + /(*(r>)) 


(21) 


where  7  is  the  identity  matrix  in  JR".  The  parallelism  that  can  be  exploited  by  simultaneously  evaluating 
Af  timesteps  can  be  seen  by  examining  (21).  Clearly,  the  evaluation  and  LU  decomposition  of  the  matrix 
on  the  left-hand  side  of  (21),  and  the  computation  of  all  the  terms  on  the  right-hand  side  of  (21)  except  the 
**  * 1  ( r;  - 1 )  term  can  be  performed  for  all  M  timesteps  simultaneously. 

Once  the  parallelizable  portion  of  (21)  is  complete,  there  is  still  some  computation  that  must  be  performed 
serially  to  update  the  x*+l(r^)  values.  This  serial  computation  involves  a  matrix  backsolve  to  compute 
x*+1(n).  which  is  then  used  to  complete  the  right-hand  side  for  the  equation  for  x*+1(r2),  from  which 
x*+l(ri)  can  be  computed  with  a  matrix  backsolve,  which  is  then  used  to  complete  the  right-hand  side  of  the 
equation  for  xk>1(ra),  etc.  In  total,  this  serial  section  involves  M  matrix  bacluolves  and  Af  - 1  multiplication 
and  additions  to  complete  the  right-hand  sides.  However,  the  bacluolves  are  by  far  the  smallest  part  of  the 
timestep  calculation,  so  this  serial  section  is  not  too  computationally  expensive,  and  therefore  processing 
timesteps  in  parallel  can  be  effective. 
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5  Program  Implementation  and  Simulation  Results 

In  this  section,  we  compare  the  basic  algorithms  used  in  three  programs,  SPLICE3.1,  based  on  ITA,  RE- 
LAX2,  based  on  waveform  relaxation,  and  a  new  program,  SPLAX,  based  on  WRN  with  iterative  timestep 
refinement.  The  two  aspects  common  to  the  three  relaxation  algorithms,  that  of  partitioning  a  large  prob¬ 
lem  into  loosely  coupled  subcircuits  and  ordering  the  subcircuits  for  the  relaxation,  is  presented  in  the  next 
subsection.  Then,  the  aspects  where  the  three  programs  differ  are  described.  In  the  last  subsection,  the 
comparison  results  are  presented  and  interpreted. 

5.1  Partitioning  and  Ordering 

Relaxation  algorithms  have  been  applied  in  several  programs  for  the  simulation  of  MOS  integrated  eir- 
cuits[NEW83,  DUM86].  These  programs  do  not  use  the  relaxation  algorithms  exactly  as  described  in  earlier 
sections,  but  employ  two  important  modifications  to  improve  the  speed  of  relaxation  convergence.  MOS 
circuits  tend  to  have  small  subsystems  that  are  very  tightly  coupled.  In  order  to  avoid  the  slow  relax¬ 
ation  convergence  that  would  result  from  trying  to  decompose  such  subsystems  into  single  equations  in  one 
unknown,  the  circuit  is  decomposed  into  blocks.  The  blocks  are  solved  using  some  form  of  Gaussian  elimi¬ 
nation,  and  the  relaxation  is  applied  to  the  block-decomposed  system.  Another  feature  of  MOS  circuits  is 
that  they  tend  to  be  directional.  That  is,  it  is  possible  to  order  the  equations  such  that  the  system  is  almost 
lower-triangular,  and  finding  this  ordering  can  accelerate  the  convergence  of  the  Gauss-Seidel  relaxation. 

Grouping  together  tightly  coupled  equations  or  blocks  in  a  system,  is  referred  to  as  partitioning  the 
system.  The  partitioning  algorithm  used  for  the  above  three  programs  is  based  on  trying  to  directly  estimate 
the  coupling  between  pairs  of  equations  to  determine  which  equations  should  be  grouped  together[WHI86]. 
In  particular,  the  algorithm  amounts  to  removing  the  coupling  between  two  nodes,  and  then  for  each  node 
replacing  the  remaining  circuitry  connected  to  each  of  the  nodes  by  Norton  equivalent  conductances  to 
ground.  The  coupling  is  then  added  back  to  the  circuit,  yielding  a  two  node  problem.  The  speed  at  which 
a  relaxation  algorithm  applied  to  the  two  node  problem  will  converge  can  be  computed  exactly,  and  if  the 
speed  is  too  slow,  the  two  nodes  are  placed  in  the  same  partition  or  subcircuit. 

Once  the  system  has  been  partitioned,  the  resulting  blocks  are  ordered  so  that  the  relaxation  is  applied 
to  a  problem  that  is  as  block  lower  triangular  as  possible.  In  some  sense,  partitioning  and  ordering  the  sub¬ 
system  of  equations  are  performing  similar  functions.  They  are  both  attempting  to  eliminate  slow  relaxation 
convergence  due  to  two  nodes  in  a  large  circuit  being  tightly  coupled.  There  is,  however,  a  key  difference.  If, 
for  example,  v<  is  strongly  dependent  on  Vj,  and  Vj  is  strongly  dependent  on  t>, ,  then  a  partitioning  algorithm 
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should  lump  the  two  nodes  together  into  one  subsystem.  However,  if  v,  is  strongly  dependent  on  vy,  but  vy  is 
weekly  dependent  on  then  node  *  and  node  j  should  not  be  lumped  together,  but  the  ordering  algorithm 
should  insure  that  the  system  is  block  lower  triangular  by  ordering  the  equations  so  that  t>y  is  computed 
before  computing  v,. 

Resistors  and  capacitors  do  not  exhibit  the  kind  of  unidirectional  coupling  that  is  of  concern  to  the 
ordering  algorithm.  In  fact,  the  only  element  type  of  concern  to  the  ordering  algorithm  is  the  transistor, 
because  it  exhibits  unidirectional  coupling.  That  is,  the  drain  and  source  terminals  of  an  MOS  transistor  are 
strongly  dependent  on  the  gate  terminal  of  the  transistor,  but  the  gate  is  almost  independent  of  the  drain 
and  source.  Therefore,  the  ordering  algorithm  used  in  our  three  programs  attempts  to  order  subsystems  so 
that  a  subsystem  containing  a  given  transistor’s  drain  or  source  is  solved  after  the  subsystem  containing  the 
given  transistor’s  gate.  If  the  circuit  contains  global  feed  back  loops,  this  may  not  always  be  possible,  and 
such  loops  are  broken  in  a  somewhat  ad  hoc  fashion. 

5.2  Differences  Between  Programs 

In  order  to  make  meaningful  comparisons  between  the  effectiveness  of  the  three  algorithms,  we  tried  to 
keep  algorithms  used  in  the  programs  as  close  to  the  same  as  possible.  For  example,  all  three  programs  are 
written  in  C,  and  as  mentioned  above,  all  three  programs  use  the  same  ordering  and  partitioning  algorithms. 
However,  the  nature  of  the  three  algorithms  do  force  certain  differences  between  the  three  programs. 

Both  SPLAX  and  RELAX2  use  waveform  relaxation-based  algorithms  and,  because  of  the  nonuniform 
way  in  which  WR  converges,  it  is  much  more  efficient  if  the  simulation  interval  [0,  T\  is  broken  into  subinter¬ 
vals,  or  windows,  [0,  tx],  [t! ,  ta]  ,  ...  ,  [tn,T].  It  is  difficult  to  determine  a  good  window  size  a  priori,  and  the 
two  programs  do  not  use  the  same  strategy  for  picking  these  windows.  In  RELAX2,  an  adaptive  windowing 
algorithm  is  used,  as  the  interaction  between  window  sise  and  converge  speed  is  reasonably  well  understood 
for  WR.  However,  in  the  case  of  WRN,  a  good  window  sise  tends  to  be  a  function  of  the  equation  nonlin¬ 
earities  rather  than  the  differential  equation  dynamics.  Since  we  did  not  have  a  good  idea  on  how  to  adapt 
window  sizes  in  this  case,  the  SPLAX  program  uses  relatively  small  window  sizes  compared  to  RELAX2. 

The  SPLAX  program  does  not  use  the  same  trapezoidal  discretization  method,  described  in  Section  2,  as 
used  in  SPLICE3.1  and  RELAX2.  This  is  because  the  iterative  timestep  refinement  technique  used  in  SPLAX 
can  use  very  large  timesteps  for  early  WRN  iterations,  and  if  the  trapezoidal  method  were  used,  it  would 
produce  spurious  oscillatory  solutions(8ee  [NAG75]).  Although  the  timestep  refinement  would  eventually 
remove  the  spurious  oscillation,  it  might  cost  extra  relaxation  iterations. 
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To  avoid  this  problem,  the  more  stable  second-order  backward-difference  method  is  used  to  solve  the  time 
varying  linear  iteration  equation  given  in  Algorithm  2.  Specifically, 

+  */).  “(‘  +*/))+  a?'(v*4-  -(<-  4  +  h,)  -  v?(t  +  A,))] 

-  *?(<))] 

+<*s [«(«‘+ll‘(l  +  ht),  «(<  +  *»))  +  ,*«(vk*'',St  +  *»>■?<«  +  *»>>(,»♦«(<  +  A»)  _  „*((  +  A»))j 
-[A(»*+M(<  +  M,«(<  +  A/))  +  df,-{V-  '(t  tt(t  +  (of ♦»(«  +  A,)  -  v?(t  +  A/))]  =  0 


where  Qri,  oj,  o3  are  functions  of  the  forward  and  backward  timestepe,  A /  and  A»  (for  the  exact  formulation, 
see  [GEA74]).  This  would  seem  to  put  the  SPL.AX  program  at  a  slight  disadvantage,  as  the  second-order 
backward  difference  method  does  have  larger  truncation  error  than  the  trapezoidal  method,  and  would 
normally  require  more  timesteps  for  comparable  accuracy.  This  does  not  aeem  to  be  the  case  however, 
perhaps  because  the  backward-difference  method  is  being  used  in  combination  with  the  a  posteriori  LTE 
control  described  above. 


5.3  Simulation  Results 


In  Table  1,  the  classical  direct,  ITA,  standard  WR,  and  WRN  methods  are  compared  using  a  number  of 
example  circuits.  The  examples  are:  a  critical  path  from  a  microprocessor  control  circuit,  Aficroc,  the  logic  for 
a  successive  approximation  register,  Scdac,  a  dynamic  memory  cross-section.  Dram,  a  static  memory  cross- 
section,  Sram,  and  a  digital  filter,  Digfi.  In  Microc,  the  waveforms  exhibit  multirate  behavior  mostly  in  the 
form  of  latency,  and  the  trade-offs  in  WR  and  ITA  balance  to  produce  similar  run  times,  but  WRN  is  faster 
than  either  method  since  it  exploits  multirate  behavior  completely  using  relatively  inexpensive  iterations. 
The  circuit  is  too  small  for  relaxation  methods  to  be  of  significant  benefit  though,  and  the  classical  direct 
method  performs  beat.  The  circuits  Sciac  and  Dram  exhibit  latency  and  are  coupled,  and  therefore  ITA 
is  faster  than  WR,  but  WRN  with  its  combined  benefits  is  again  faster  than  either.  The  WR  algorithm  is 
fastest  for  the  circuit  Digfi,  because  the  partitioner  breaks  the  circuit  into  completely  unidirectional  blocks, 
and  therefore  WR  converges  in  one  iteration.  Note  that  for  Digfi,  WRN  is  6  times  slower  than  WR,  and 
WRN  proved  to  be  even  slower  than  direct  methods!  In  Sram,  WR  is  again  the  fastest,  because  although 
there  is  some  localized  coupling  due  to  the  gate-source  and  gate-drain  capacitance  (which  are  not  included 
in  the  Digfi  example),  Sram  is  essentially  unidirectional. 

The  above  results  imply  that  WRN  is  not  very  effective  when  used  to  simulate  idealized  MOS  circuits 
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which  ignore  gate-source  and  gate-drain  capacitances.  To  provide  stronger  evidence  of  this  behavior,  Srem 
was  simulated  a  number  of  times  with  a  range  of  gate-source  and  gate-drain  capacitances,  as  controlled  by 
the  thin-oxide  thickness,  tox,  values.  As  Table  2  demonstrates,  as  the  value  of  tox  increases,  and  therefore 
the  gate-source  and  gate-drain  capacitance  decreases,  the  ratio  of  the  runtime  of  WRN  to  WR  also  increases. 

An  area  of  future  work  suggested  by  these  results  is  to  consider  an  algorithm  which  combines  both  the 
standard  WR  and  WRN  methods  in  one  simulator.  The  choice  of  which  method  to  use  for  a  given  circuit 
would  then  be  based  on  the  unidirectionality  and  linearity  characteristics  of  the  subcircuits  that  comprise 
the  circuit.  The  portions  of  the  circuit  that  contain  elements  with  highly  nonlinear  device  characteristics 
or  exhibit  predominantly  unidirectional  signal  flow  could  be  solved  using  WR  while  the  remaining  portions 
that  feature  moderate  coupling  or  weakly  nonlinear  device  characteristics  could  be  solved  using  WRN.  This 
modified  WR  method  is  expected  to  be  even  more  effective  as  it  exploits  the  advantages  of  standard  WR  and 
WRN.  However,  such  an  approach  will  require  a  method  ta  automatically  select  the  appropriate  algorithm 
for  each  portion  of  the  circuit  and  this  will  likely  be  the  key  research  activity  for  this  composite  algorithm. 


TABLE  2  -  Runtime  Ratios  of  SPLAX  to  RELAX2  as  tox  varies 


6  Conclusions 

A  new  circuit  simulation  algorithm,  based  on  combining  waveform-reiaxation-Newton  with  an  iterative  step- 
sise  refinement  scheme  has  been  described  and  a  new  convergence  proof  for  the  WN  has  been  presented. 
It  has  been  shown  experimentally  that  the  new  algorithm  is  effective  for  simulating  MOS  digital  circuits, 
and  compares  well  with  with  other  relaxation  algorithms  when  simulating  moderately  coupled,  multirate 
circuits.  It  is  not  as  well-suited  to  the  simulation  of  unidirectional  circuits,  for  which  the  standard  WR  is 
more  appropriate,  and  has  difficulty  on  certain  highly-nonlinear  problems.  However,  the  WRN  algorithm  has 
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certain  features  that  can  be  exploited  on  parallel  processors  and  this  aspect  provides  an  additional  advantage 
over  other  methods. 
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A  Proof  of  Theorem  3 


The  proof  of  Theorem  3  is  somewhat  complicated  in  detail.  To  avoid  an  unnecessarily  lengthy  presentation, 
but  still  convey  where  the  necessary  conditions  stem  from,  several  not  quite  obvious  lemmas  will  presented 
without  proof. 

The  following  norm  on  the  space  of  functions  that  map  [0,  T\  — »  ZR"  will  be  used  throughout  the  proof. 

Definition  1  The  0  norm  on  a  function  v  :  (O,!1)  — < >  52"  is  defined  as  where  B  is 

some  positive  nnmher.  The  0  norm  is  denoted  bp  ||v||b- 

Note  that  the  0  norm  is  really  a  continuous  family  of  norms,  one  for  each  positive  number  B.  On  the  space 
of  continuous  differentiable  functions  that  map  [0,  T]  — »  Rn,  the  0  norm  is  equivalent  to  the  norm  used  in 
the  WN  derivation. 

Lemma  l  ///  :  J2 "  — »  ZR"  has  a  derivative  J/  :  ZR"  — •€  J2"x",  which  is  Lipschitz  continuous  with  some 
Lipschitz  constant  lj,  then 

II /(»)  ~  /(*)  -  //(*)(*  ~  V)ll  <  i/ll*  -  Vll2  (22) 
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Lemma  1  can  be  thought  of  aa  a  remainder  theorem  for  the  Taylor  series,  the  detailed  proof  can  be  found  in 
[STO]. 

Lemma  2  Given  F(v)  as  in  (11)  and  Jp(v)  as  in  (16),  ii  follows  that 

||  J‘  F(x)( r)  -  F(y)(r)  -  JF(y)(x  -  y)(r)dr||B  <  (/,  +  ^)||*  -  y\\*B  (23) 

for  any  x,y  in  the  space  of  continuous  differentiable  functions  that  map  [0,  T)  — *  ZR",  where  l,  is  the  Lipschiti 
constant  for  yj-  and  l,  is  the  Lipschitz  constant  for  |£. 

Lemma  2  can  be  derived  by  exploiting  properties  of  the  0  norm[LEL82],  and  applying  Lemma  1. 

Lemma  3  Let  t/4,ot+1  be  two  iterates  generated  by  (17),  and  F(v)  be  as  given  in  (IS),  then 


where  lj-t  is  the  bound  on 


(If)" 


Lemma  3  can  also  be  shown  using  0  norm  properties. 

To  prove  the  theorem,  let  vk ,  and  v‘+1  be  the  k  and  k  +  1‘*  WN  iterates.  Then 

||  J'  F(vk+l)(r)  -  F(vk)(r)  -  JF(vk)(vk+l  -  v‘)(r)dr||B  <  (/,  +  ^)||ut+l  -  v*||2* 

by  Lemma  1.  By  definition  of  the  WN  algorithm, 


F(vk)( r)  -  JF(vk)(vk+l  -  vk)(r)  =  0, 


and  therefore  (25  )  can  be  reduced  to 


By  Lemma  2, 


II  jf  F(u*+,)(r)dr|l  <  (/ 4  +  ^)||v*+1  -  v*||2B. 
— -L—ll^-u^'llfl  <  II  f  F(vk)(r)dr\\B. 

ViO  +  Tf)  J° 


Substituting  into  equation  (27)  leads  to 

||v*+J  _  <  (,..l(1  +  Tk)Wi  +  ?±)\\vk+'  -  v‘||2B.  (29) 

The  theorem  follows  by  noticing  that  the  0  norm  of  ||t/*+1  -  v*||B  can  be  made  as  small  as  desired  by 
increasing  B,  because  vt+I(0)  -  v*(0)  =  0.  The  result  is  that  by  picking  a  B  large  enough,  (29)  can  be 


written  ss 


||v‘+j-v*+,||b  <  7ll«'*+I-«*lla 
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where  7  <  1  and  is  a  function  of  B,  /«- t.  14,  '*•  •*«  T.  It  therefore  follow,  that  is  a  cauchy  eequence  and 
therefore  converges  ■ 
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